An Efficient Way to Learn Rules for Grapheme-to-Phoneme Conversion in Chinese

Authors

  • Zi-rong ZHANG
  • Min CHU
  • Eric CHANG
Abstract

Grapheme-to-phoneme (G2P) conversion is a key component of a Text-to-Speech (TTS) system. The main problem facing the G2P component of a Mandarin TTS system is determining the pronunciation of polyphonic characters. By studying the distribution and characteristics of polyphones in a large text corpus with corrected pinyin transcriptions, this paper shows that correct G2P conversion for 41 key polyphones and 22 key polyphonic multi-syllabic words will constrain the overall error rate to below 0.068%. A stochastic decision list based on Extended Stochastic Complexity is used to learn G2P conversion rules for these key polyphones and polyphonic words. With the generated rules, the G2P error rate decreased from 0.88% to 0.44%. Tagging a corpus with correct pinyin for training and testing the rules is labor-intensive and time-consuming. This paper therefore also proposes a semi-automatic tagging approach, which saves almost half of the workload.
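To make the idea of a decision list for polyphone disambiguation concrete, here is a minimal sketch in Python. It is not the paper's method: the rules below are hand-written for illustration rather than learned with Extended Stochastic Complexity, and the names `DECISION_LISTS`, `next_char_in`, `prev_char_in`, and `pronounce` are all hypothetical. It only shows how an ordered list of context tests, ending in a default, can pick a pinyin reading for a polyphonic character.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A context test looks at the sentence and the position of the polyphone.
ContextTest = Callable[[str, int], bool]
# A rule is (human-readable description, context test, pinyin with tone digit).
Rule = Tuple[str, ContextTest, str]


def next_char_in(chars: str) -> ContextTest:
    """True when the character right after position i is one of `chars`."""
    def test(sentence: str, i: int) -> bool:
        return i + 1 < len(sentence) and sentence[i + 1] in chars
    return test


def prev_char_in(chars: str) -> ContextTest:
    """True when the character right before position i is one of `chars`."""
    def test(sentence: str, i: int) -> bool:
        return i > 0 and sentence[i - 1] in chars
    return test


def always(sentence: str, i: int) -> bool:
    """Default rule: matches any context."""
    return True


# Hypothetical, hand-written rules for one polyphone. A learned decision
# list would be ordered by the learner; here the order is chosen by hand.
DECISION_LISTS: Dict[str, List[Rule]] = {
    "行": [
        ("followed by 业 or 列", next_char_in("业列"), "hang2"),
        ("preceded by 银", prev_char_in("银"), "hang2"),
        ("default reading", always, "xing2"),
    ],
}


def pronounce(sentence: str, i: int) -> Optional[str]:
    """Return the pinyin chosen by the first matching rule, or None if the
    character at position i has no decision list in this sketch."""
    rules = DECISION_LISTS.get(sentence[i])
    if rules is None:
        return None
    for _description, test, pinyin in rules:
        if test(sentence, i):
            return pinyin
    return None


if __name__ == "__main__":
    for text, pos in [("银行", 1), ("行业", 0), ("行走", 0)]:
        print(text, pos, pronounce(text, pos))
```

Rules are tried in order and the first match wins, so more specific contexts sit above the default reading.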


Similar resources

A Language-Independent, Data-Oriented Architecture for Grapheme-to

We report on an implemented grapheme-to-phoneme conversion architecture. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language which takes as its input the spelling of words, and produces as its output the phonetic transcription according to the rules implicit in t...


Dialect variation in Boro Language and Grapheme-to-Phoneme conversion rules to handle lexical lookup fails in Boro TTS System

It is not possible to include all the words of a natural language in a general text-to-speech system. A grapheme-to-phoneme conversion system is essential for pronouncing a word that is out of vocabulary. Grapheme-to-phoneme rules play a vital role where lexical lookup fails. Though a basic grapheme-to-phoneme rule system is very simple, it contributes strongly to the naturalness of a TTS system. Letter-to...


An Efficient Way to Learn English Grapheme-to-Phoneme Rules

We present an efficient way to automatically learn grapheme-to-phoneme mapping rules for English by using Kohonen's concept of Dynamically Expanding Context. This method constructs rules that are most general in the sense of an explicitly defined specificity hierarchy. As the hierarchy, we have used the amount of expanding context around the symbol to be transformed, weighted towards the right. To...


Learning from errors in grapheme-to-phoneme conversion

In speech technology it is very important to have a system capable of accurately performing grapheme-to-phoneme (G2P) conversion, which is not an easy task especially if talking about languages like English where there is no obvious letter-phone correspondence. Manual rules so widely used before are now leaving the way open for the machine learning techniques and language independent tools. In ...


Grapheme-to-Phoneme conversion, a knowledge-based approach

This paper reports the results of an ongoing project at Högskolan i Skövde, aimed at creating a knowledge-based system for grapheme-to-phoneme conversion for Swedish. The focus lies on the development and implementation of an algorithm for parsing orthographic text, and on phonetic rules for the transcription.



Publication date: 2002